Mining significant substructure-substructure pairs in structural associations
Abstract
Biological and chemical objects in bioinformatics are often linked to structures: proteins and DNA to their sequences, small-molecule compounds to their chemical structures, and pathways to their network structures. Associations among these objects, such as interactions between proteins and compounds, are of particular interest in the field and are often given as a graph in which the objects are represented as nodes and the associated pairs are connected by edges. To analyze these associations through relationships between the local substructures of individual objects, we propose a general framework for exactly enumerating substructure-substructure pairs that are significantly overrepresented in the “associated” pairs (those connected by edges) relative to all other “control” pairs (those with no edge). Analysis of relationships between local information has been applied successfully to sequences, table-form data, and texts. For example, co-evolution is a key biological concept for understanding protein interactions in terms of their sequences [1]; co-modules, which describe coherent patterns across paired data sets, have been developed to analyze gene-expression and drug-response data [2]; and word co-occurrence is one of the fundamental tools for mining biomedical texts [3]. Applying this concept, through our framework, to associations between more complicated structures such as trees and graphs should therefore also give insight into the underlying biological problems. The proposed framework can be applied to various types of association datasets with structures (sequences, trees, and graphs), such as interactions between compounds and proteins and side-effect associations between compounds. In this work, we analyze the interactions between drugs and targets, i.e., the structural associations between the chemical structures of drug compounds and the amino-acid sequences of target proteins.
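To make the idea of "overrepresented in associated pairs rather than control pairs" concrete, the following is a minimal sketch and not the authors' algorithm. It assumes that both kinds of objects are plain strings (the abstract's compounds are graphs), that substructures are contiguous substrings, and that significance is scored with a one-sided hypergeometric tail; the abstract does not name a specific test. The function names, the toy data, and the threshold `alpha` are hypothetical, and no multiple-testing correction is applied.

```python
from itertools import product
from math import comb


def substrings(seq, max_len):
    """Return all contiguous substrings of seq with length 1..max_len."""
    return {seq[i:i + k]
            for k in range(1, max_len + 1)
            for i in range(len(seq) - k + 1)}


def hypergeom_tail(k, n_total, n_assoc, n_hit):
    """P(X >= k): chance that at least k of the n_hit pairs containing a
    substructure pair are associated, if the n_hit pairs were drawn at
    random from n_total pairs of which n_assoc are associated."""
    upper = min(n_assoc, n_hit)
    favourable = sum(comb(n_assoc, i) * comb(n_total - n_assoc, n_hit - i)
                     for i in range(k, upper + 1))
    return favourable / comb(n_total, n_hit)


def significant_pairs(left, right, edges, max_len=1, alpha=0.05):
    """Brute-force scan of every substring pair (illustration only)."""
    left_subs = [substrings(s, max_len) for s in left]
    right_subs = [substrings(s, max_len) for s in right]
    n_total = len(left) * len(right)   # associated + control pairs
    n_assoc = len(edges)               # pairs connected by an edge
    counts = {}                        # (a, b) -> [associated hits, all hits]
    for i, j in product(range(len(left)), range(len(right))):
        is_assoc = (i, j) in edges
        for a in left_subs[i]:
            for b in right_subs[j]:
                c = counts.setdefault((a, b), [0, 0])
                c[0] += is_assoc
                c[1] += 1
    results = []
    for (a, b), (k, n_hit) in counts.items():
        p = hypergeom_tail(k, n_total, n_assoc, n_hit)
        if p <= alpha:
            results.append((a, b, k, n_hit, p))
    return sorted(results, key=lambda r: r[4])


# Toy data (hypothetical): an edge exists exactly when the left string
# contains "a" and the right string contains "X".
left = ["ab", "ac", "db", "dc"]
right = ["XY", "XZ", "WY", "WZ"]
edges = {(0, 0), (0, 1), (1, 0), (1, 1)}

for a, b, k, n_hit, p in significant_pairs(left, right, edges):
    print(f"{a!r}-{b!r}: {k}/{n_hit} hits associated, p = {p:.5f}")
```

On this toy graph the pair ("a", "X") occurs only in associated pairs, so its tail probability is small, while pairs such as ("b", "X") are split evenly between associated and control pairs. A practical implementation would replace the exhaustive scan with frequent-pattern mining and pruning, and would correct for the number of pairs tested.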
Similar resources
Methods for ‘Mining significant substructure pairs for interpreting polypharmacology in drug-target network’
This supplement provides a detailed description of the computational methods used in the work, consisting mainly of four parts: 1) mining frequent subgraph-subsequence pairs (Section 1), 2) evaluating the significance of generated substructure pairs (Section 2), 3) generating the GRASP fingerprint of an arbitrary given compound-protein pair (Section 3), and 4) comparing two compound-protein pairs in ...
Learning Co-Substructures by Kernel Dependence Maximization
Modeling associations between items in a dataset is a problem that is frequently encountered in data and knowledge mining research. Most previous studies have simply applied a predefined fixed pattern for extracting the substructure of each item pair and then analyzed the associations between these substructures. Using such fixed patterns may not, however, capture the significant association. W...
Mining Significant Substructure Pairs for Interpreting Polypharmacology in Drug-Target Network
A current key feature of drug-target networks is that drugs often bind to multiple targets, known as polypharmacology or drug promiscuity. Recent literature has indicated that relatively small fragments in both drugs and targets are crucial in forming polypharmacology. We hypothesize that principles behind polypharmacology are embedded in paired fragments in molecular graphs and amino acid seque...
Graph-Based Hierarchical Conceptual Clustering
Hierarchical conceptual clustering has proven to be a useful, although under-explored, data mining technique. A graph-based representation of structural information combined with a substructure discovery technique has been shown to be successful in knowledge discovery. The SUBDUE substructure discovery system provides one such combination of approaches. This work presents SUBDUE and the develop...
A New Approach to Protein Structure Mining and Alignment
One of the largest areas of bioinformatic and data mining research has been in the protein domain. These efforts have included protein structure prediction, folding pathway prediction, sequence alignment, ab initio simulation, structure alignment, substructure detection and many others. Substructure detection is generally defined as the mining of a molecule’s 3D structure in order to find inter...